Automated unsupervised authorship analysis using evidence accumulation clustering
Similar resources
Automated unsupervised authorship analysis using evidence accumulation clustering
Authorship Analysis aims to extract information about the authorship of documents from features within those documents. Typically, this is performed as a classification task with the aim of identifying the author of a document, given a set of documents of known authorship. Alternatively, unsupervised methods have been developed primarily as visualisation tools to assist the manual discovery of ...
Efficient Unsupervised Authorship Clustering Using Impostor Similarity
Some real-world authorship analysis applications require techniques that scale to thousands of documents with little or no a priori information about the number of candidate authors. While there is extensive research on identifying authors given a small set of candidates and ample training data, almost none is based on real-world applications of clustering documents by authorship, independent o...
Data Clustering Using Evidence Accumulation
We explore the idea of evidence accumulation for combining the results of multiple clusterings. Initially, n d−dimensional data is decomposed into a large number of compact clusters; the K-means algorithm performs this decomposition, with several clusterings obtained by N random initializations of the K-means. Taking the cooccurrences of pairs of patterns in the same cluster as votes for their ...
Weighted Evidence Accumulation Clustering Using Subsampling
We introduce an approach based on evidence accumulation (EAC) for combining partitions in a clustering ensemble. EAC uses a voting mechanism to produce a co-association matrix based on the pairwise associations obtained from N partitions and where each partition has equal weight in the combination process. By applying a clustering algorithm to this co-association matrix we obtain the final data...
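The voting step described above can be illustrated with a minimal sketch. It assumes the N partitions are already available as label lists (for example from repeated K-means runs) and weights each partition equally. The function names `co_association` and `consensus_labels` are hypothetical, and the final step (linking pairs with more than half of the votes, then taking connected components) is only one simple choice of clustering algorithm for the co-association matrix, assumed here for illustration rather than taken from the abstract.

```python
from itertools import combinations


def co_association(partitions):
    """Return the n x n matrix of vote fractions.

    Entry [i][j] is the fraction of partitions in which items i and j
    share a cluster; every partition carries equal weight.
    """
    n = len(partitions[0])
    votes = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        if len(labels) != n:
            raise ValueError("all partitions must label the same n items")
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    votes[i][j] += 1.0
    m = len(partitions)
    return [[v / m for v in row] for row in votes]


def consensus_labels(coassoc, threshold=0.5):
    """Assumed final step: link pairs whose vote fraction exceeds
    `threshold`, then return the connected components as clusters."""
    n = len(coassoc)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if coassoc[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]


# Three toy partitions of six items (illustrative data only).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2, 2],
    [0, 0, 1, 2, 2, 2],
]
coassoc = co_association(partitions)
labels = consensus_labels(coassoc)
print(labels)  # [0, 0, 1, 2, 2, 2]
```

In this toy data, items 0 and 1 are grouped together in all three partitions, items 3, 4 and 5 are pairwise grouped in at least two, and item 2 never gets a majority with any other item, so it ends up in its own cluster.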
Evidence Accumulation Clustering using Pairwise Constraints
Recent work on constrained data clustering has shown that the incorporation of pairwise constraints, such as must-link and cannot-link constraints, increases the accuracy of single-run data clustering methods. It was also shown that the quality of a consensus partition, resulting from the combination of multiple data partitions, is usually superior to the quality of the partitions produced b...
Journal
Journal title: Natural Language Engineering
Year: 2011
ISSN: 1351-3249,1469-8110
DOI: 10.1017/s1351324911000313